
Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation


Abstract

The task of estimating the maximum number of concurrent speakers from single-channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene classification. Building upon powerful machine learning methodology, we develop a Deep Neural Network (DNN) that estimates a speaker count. While DNNs efficiently map input representations to output targets, it remains unclear how to best handle the network output to infer integer source count estimates, as a discrete count estimate can be tackled either as a regression or as a classification problem. In this paper, we investigate this important design decision and also address complementary parameter choices such as the input representation. We evaluate a state-of-the-art DNN audio model based on a Bi-directional Long Short-Term Memory network architecture for speaker count estimation. Through experimental evaluations, we aim to identify the best overall strategy for the task and show results for five-second speech segments in mixtures of up to ten speakers.
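The design decision described in the abstract, reading an integer count from a classification output versus a regression output, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the class layout (index k meaning k speakers), and the rounding and clipping rule are assumptions made only for illustration.

```python
import math

# Assumption: counts range from 0 to 10, matching the "up to ten speakers"
# setting in the abstract. The exact class layout in the paper may differ.
MAX_SPEAKERS = 10


def count_from_classification(class_scores):
    """Return the index of the highest score; index k is read as k speakers."""
    return max(range(len(class_scores)), key=lambda k: class_scores[k])


def count_from_regression(value, max_count=MAX_SPEAKERS):
    """Round a real-valued output to the nearest integer, clipped to [0, max_count]."""
    rounded = math.floor(value + 0.5)  # round half up (hypothetical choice)
    return min(max(rounded, 0), max_count)
```

With the classification reading, the estimate is always a valid class by construction. With the regression reading, the raw output can fall outside the valid range or between integers, so some post-processing rule such as the rounding and clipping above is needed.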
